An Approximate Solution Method for Large Risk-Averse Markov Decision Processes
نویسندگان
چکیده
Stochastic domains often involve risk-averse decision makers. While recent work has focused on how to model risk in Markov decision processes using risk measures, it has not addressed the problem of solving large risk-averse formulations. In this paper, we propose and analyze a new method for solving large risk-averse MDPs with hybrid continuous-discrete state spaces and continuous action spaces. The proposed method iteratively improves a bound on the value function using a linearity structure of the MDP. We demonstrate the utility and properties of the method on a portfolio optimization problem.
منابع مشابه
Risk-averse dynamic programming for Markov decision processes
We introduce the concept of a Markov risk measure and we use it to formulate risk-averse control problems for two Markov decision models: a finite horizon model and a discounted infinite horizon model. For both models we derive risk-averse dynamic programming equations and a value iteration method. For the infinite horizon problem we also develop a risk-averse policy iteration method and we pro...
متن کاملAccelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
متن کاملOn terminating Markov decision processes with a risk-averse objective function
We consider a class of undiscounted terminating Markov decision processes with a risk-averse exponential objective function and compact constraint sets. After assuming the existence of an absorbing cost-free terminal state , positive transition costs away from , and continuity of the transition probability and cost functions, we establish (i) the existence of a real-valued optimal cost function...
متن کاملRisk-Averse Approximate Dynamic Programming with Quantile-Based Risk Measures
In this paper, we consider a finite-horizon Markov decision process (MDP) for which the objective at each stage is to minimize a quantile-based risk measure (QBRM) of the sequence of future costs; we call the overall objective a dynamic quantile-based risk measure (DQBRM). In particular, we consider optimizing dynamic risk measures where the one-step risk measures are QBRMs, a class of risk mea...
متن کاملA Heuristic Variable Grid Solution Method for POMDPs
Partially observable Markov decision processes (POMDPs) are an appealing tool for modeling planning problems under uncertainty. They incorporate stochastic action and sensor descriptions and easily capture goal oriented and process oriented tasks. Unfortunately, POMDPs are very difficult to solve. Exact methods cannot handle problems with much more than 10 states, so approximate methods must be...
متن کامل